
<!DOCTYPE html>
<html>

<head>
	<meta charset="utf-8">
	<meta name="generator" content="Hugo 0.88.1" />
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700" rel="stylesheet" type="text/css">
	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
	<link rel="stylesheet" href="css/custom.css">
	<link rel="stylesheet" href="css/normalize.css">

	<title>Spark-TTS</title>
	<link href="css/bootstrap.min.css" rel="stylesheet">

</head>


<body>

<div class="container" >
<header role="banner">
</header>
<main role="main">
<article itemscope itemtype="https://schema.org/BlogPosting">

<div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
	<div class="text-center">
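	<h2>Spark-TTS Evaluation Findings</h2>
      <p>Mobvoi, together with leading academic institutions including the Hong Kong University of Science and Technology, Shanghai Jiao Tong University, Nanyang Technological University, and Northwestern Polytechnical University, has jointly open-sourced Spark-TTS, a new-generation speech generation model.</p>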
  </div>

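	<p><b>Strengths:</b></p>
      <ul>
				<li><p>Voice cloning needs only a very short reference audio clip.</p></li>
				<li><p>Cloning fidelity is high: the generated voice closely resembles the reference audio.</p></li>
      </ul>
  <p><b>Problems:</b></p>
      <ul>
				<li><p>Generation is slow: synthesizing audio for text of fewer than 10 characters takes more than 5 seconds, whereas the same text takes only 1 second with GPT-SoVITS.</p></li>
				<li><p>Generation intermittently stalls and finishes only after about a minute. The resulting audio is nearly a minute long, with content in the first few seconds only and silence for the rest, or with silence for the entire minute.</p></li>
		    <li><p>Pauses between sentences in the generated audio are long, sometimes as long as 2-3 seconds.</p></li>
		    <li><p>GPU memory usage is high: about 9 GB, compared with about 4 GB for GPT-SoVITS.</p></li>
      </ul>
  <p><b>Workaround:</b></p>
  		<p>Spark-TTS produces good results, but because generation is slow and the instabilities above occur, it is hard to use in latency-sensitive conversational scenarios. I therefore <strong>trained a GPT-SoVITS model on several dozen audio clips generated by Spark-TTS</strong> to clone Nezha's voice. Because all of the samples were generated by Spark-TTS, they are highly consistent and nearly free of noise, so the voice cloned by the resulting model sounds reasonably good.</p>
  <p><b>For the Spark-TTS vs. GPT-SoVITS comparison, see the table at the bottom of the page; the sample audio clips can be played.</b></p>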
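  <p>One possible post-processing workaround for the trailing-silence problem is sketched below in Python. It is not part of Spark-TTS or GPT-SoVITS, and it does not fix the one-minute stall itself; it only removes the silent tail afterwards. The function names <code>trim_trailing_silence</code> and <code>trim_wav_file</code>, the amplitude threshold of 500, and the 200 ms tail are illustrative assumptions, and the sketch assumes the output is a mono 16-bit PCM WAV file.</p>

<pre>
```python
import array
import sys
import wave


def trim_trailing_silence(samples, sample_rate, threshold=500, keep_ms=200):
    """Return samples with the silent tail removed.

    samples: sequence of signed 16-bit PCM values (mono).
    threshold: absolute amplitude at or below which a sample counts as
               silent (an assumed value; tune it for your recordings).
    keep_ms: how much silence to keep after the last audible sample.
    """
    end = len(samples)
    # Walk backwards until a sample louder than the threshold is found.
    while end > 0 and not abs(samples[end - 1]) > threshold:
        end -= 1
    # Keep a short tail so the clip does not end abruptly.
    end = min(len(samples), end + sample_rate * keep_ms // 1000)
    return samples[:end]


def trim_wav_file(src_path, dst_path, threshold=500, keep_ms=200):
    """Trim a mono 16-bit WAV file and write the result to dst_path."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.nchannels != 1 or params.sampwidth != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        pcm = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    trimmed = trim_trailing_silence(pcm, params.framerate, threshold, keep_ms)
    if sys.byteorder == "big":
        trimmed.byteswap()  # convert back before writing
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(trimmed.tobytes())
```
</pre>

  <p>A clip that is silent from start to finish comes back as at most <code>keep_ms</code> of silence, so a caller could treat a very short result as a failed generation and retry.</p>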

	
</div>


<div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">		
	<h2 id="zero-shot-inference" style="text-align: center;">Problems Observed with Spark-TTS</h2>
	<p style="text-align: center;">
		<img src="figures/bug1.png" height="400" width="1000">
	</p>
		<p style="text-align: center;" >
			<b>Figure 1.</b> Problem: the generated audio is nearly a minute long, but only the first few seconds contain content; the rest is silent.
		</p>
		<p style="text-align: center;">
		<img src="figures/bug2.png" height="400" width="1000">
	</p>
		<p style="text-align: center;" >
			<b>Figure 2.</b> Problem: the generated audio is nearly a minute long and silent throughout.
		</p>
	<p style="text-align: center;">
		<img src="figures/bug3.png" height="400" width="1000">
	</p>
		<p style="text-align: center;" >
			<b>Figure 3.</b> Problem: the pauses between sentences in the generated audio are long.
		</p>
</div>


<div class="container pt-5 mt-5 shadow-lg p-5 mb-5 bg-white rounded">
	<h2 id="coarse-grained_control" style="text-align: center;">Spark-TTS vs. GPT-SoVITS Comparison</h2>
		<div class="table-responsive pt-3">
			<table class="table table-hover pt-2">
				<thead>
				<tr>
				<th style="vertical-align : middle;text-align: center">Reference audio</th>
				<th style="vertical-align : middle;text-align: center">Text</th>
				<th style="vertical-align : middle;text-align: center">Spark-TTS</th>
				<th style="vertical-align : middle;text-align: center">GPT-SoVITS</th>
				</tr>
				</thead>
				<tbody>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
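						<td style="vertical-align : middle;text-align:center;">床前明月光，疑是地上霜，举头望明月，低头思故乡。这我可太熟了，我还知道好多诗呢，你还要听听吗？<br><em>(Moonlight before my bed, I took it for frost on the ground; I raise my head to gaze at the bright moon, then lower it and think of home. I know this one so well, and I know plenty more poems. Want to hear some more?)</em></td>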
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310131010.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181225833.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
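						<td style="vertical-align : middle;text-align:center;">对，这就是我，万人敬仰的太乙真人，虽然有点婴儿肥，但也掩不住我逼人的帅气。<br><em>(Yes, that's me, the universally revered Taiyi Zhenren. I may have a bit of baby fat, but it can't hide my dashing good looks.)</em></td>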
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310131127.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181227913.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
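						<td style="vertical-align : middle;text-align:center;">若前方无路，我便踏出一条路；若天理不容，我便逆转这乾坤。<br><em>(If there is no road ahead, I will carve one out; if heaven's law will not tolerate me, I will turn heaven and earth upside down.)</em></td>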
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310131221.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181229732.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;">别人的看法都是狗屁，你是谁只有你自己说了才算。<br><em>(Other people's opinions are rubbish; who you are is for you alone to decide.)</em></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310131303.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181231091.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;">小爷是魔，那又如何？<br><em>(So this young master is a demon. So what?)</em></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310131351.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181232027.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
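						<td style="vertical-align : middle;text-align:center;">曾经有一份真挚的爱情摆在我的面前，我没有珍惜，等到失去的时候才追悔莫及。<br><em>(A sincere love was once placed before me, and I did not cherish it; only after losing it did I come to regret it bitterly.)</em></td>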
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310134414.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181233993.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
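						<td style="vertical-align : middle;text-align:center;">他们说我注定不配拥有什么，注定要背负那些沉重的枷锁，活在他们的期望里。每天都得做那些他们安排好的事，像个没有自由的傀儡，根本没法活成自己。这个世界，这些人，真的一点也不值得我去在乎。<br><em>(They say I am destined not to deserve anything, destined to carry those heavy shackles and live within their expectations. Every day I have to do what they have arranged, like a puppet with no freedom, with no way to live as myself. This world, these people, really aren't worth my caring at all.)</em></td>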
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310134737.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181237514.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
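						<td style="vertical-align : middle;text-align:center;">你可知为什么他们都不愿意和你合作？因为你太聪明了，聪明到让他们觉得自己毫无优势。他们害怕与你站在同一起跑线，怕被你的智慧甩得远远的。其实，真正让他们害怕的，不是你的聪明，而是他们自己内心的自卑和不自信。<br><em>(Do you know why none of them want to work with you? Because you are too clever, so clever that they feel they have no advantage at all. They are afraid of standing at the same starting line as you, afraid of being left far behind by your intelligence. In truth, what really frightens them is not your cleverness but their own insecurity and lack of confidence.)</em></td>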
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310134959.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181241459.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
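						<td style="vertical-align : middle;text-align:center;">我们正在不断打破技术的边界，打造更加智能、更加人性化的产品。未来的智能手机、智能设备，将不再仅仅是工具，它们将成为我们生活的一部分，帮助我们更加高效、更加便捷地与世界连接。<br><em>(We are constantly pushing the boundaries of technology to build smarter, more human-centered products. The smartphones and smart devices of the future will no longer be mere tools; they will become part of our lives, helping us connect with the world more efficiently and more conveniently.)</em></td>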
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310135111.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181245006.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					<tr>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 230px;"><source src="audios/zero-shot/nezhai_promptvn.wav" autoplay/>Your browser does not support the audio element.</audio></td>
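						<td style="vertical-align : middle;text-align:center;">所以我觉得好的技术，它一定是有温度的，能够真正服务于大家，而不是冷冰冰地摆在那里。就像现在的语音合成技术，越来越自然，越来越贴近人们的需求，这种发展是有生命力的。它就跟当年那些突破性的科技一样，一开始可能觉得很新鲜，但慢慢地，它就融入生活，变成大家离不开的一部分了。<br><em>(So I think good technology must have warmth: it should genuinely serve everyone rather than just sit there coldly. Take today's speech synthesis technology: it is becoming more and more natural and ever closer to people's needs, and that kind of progress has vitality. Like the breakthrough technologies of the past, it may feel novel at first, but gradually it blends into daily life and becomes something people cannot do without.)</em></td>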
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/sparktts/20250310135321.wav" autoplay/>Your browser does not support the audio element.</audio></td>
						<td style="vertical-align : middle;text-align:center;"><audio controls="controls" style="width: 250px;"><source src="audios/zero-shot/gptsovits/20250312181250334.mp3" autoplay/>Your browser does not support the audio element.</audio></td>
					</tr>
					
					
				</tbody>
			</table>
		</div>
</div>

</article>
</main>
</div>

</body>
</html>